Improving the self-adaptive voice activity detector for speaker verification using map adaptation and asymmetric tapers
نویسندگان
چکیده
This paper brings an improvement of voice activity detection, based on vector quantization and speech enhancement preprocessing (VQ-VAD) proposed recently, and applied to speaker verification system under noisy environment. VQ-VAD is based on computing the likelihood ratio on an utterance-by utterance basis from mel-frequency cepstral coefficients that train speech and non-speech models. Whereas the notion of speech and non-speech segments in speech signal is independent of the speaker. For this, a modified VQ-VAD technique is proposed in this paper, by creating two UBM’s for speech and non-speech models, trained from a long utterance-independence model. Then, an adaptation of UBM’s models to the short utterance of speaker is performed via MAP adaptation, instead of using VQ models. Mel-frequency cepstral coefficient’s were also extracted by using the recently proposed asymmetric tapers instead of the traditional Hamming windowing. Using the GMM–UBM as a baseline system for speaker verification, extensive simulation results were done by adding different noise levels to N. Asbai · M. Bengherabi Center for Development of Advanced Technologies (CDTA), Baba Hassen, Algeria N. Asbai e-mail: [email protected] M. Bengherabi e-mail: [email protected] N. Asbai (B) · A. Amrouche · Y. Aklouf Speech Com. & Signal Proc. Lab., Faculty of Electronics and Computer Sciences, USTHB, 16 111 Bab Ezzouar, Algeria e-mail: [email protected] A. Amrouche e-mail: [email protected] Y. Aklouf e-mail: [email protected] the clean TIMIT database, characterized by its short training and very short testing utterances. The obtained results show the superiority of the proposed GMM-MAP-VAD approach in adverse conditions. Furthermore a drastic reduction in the EER is observed when using asymmetric tapers.
منابع مشابه
On the use of asymmetric-shaped tapers for speaker verification using i-vectors
This paper presents asymmetric-shaped tapers (or windows) for speaker recognition. Symmetric tapers (e.g., hamming), having the linear phase property and longer time delay, are widely used for short-time analysis of speech signals. Since human speech perception is relatively insensitive to short-time phase distortion, the linearity constraint on phase can be removed without any adverse effects....
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملSpeaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
متن کاملRobust voice activity detection for narrow-bandwidth speaker verification under adverse environments
We describe a voice activity detection algorithm which leads to significant improvement of a narrow-bandwidth speaker verification system under harsh environments. This algorithm is based on a time-scale feature which is extracted from wavelet subbands. A statistical quantile filtering technique is proposed to estimate an adaptive noise threshold. A hang-over scheme is then applied to bridge sh...
متن کاملA New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- I. J. Speech Technology
دوره 18 شماره
صفحات -
تاریخ انتشار 2015